Improving Text representations through Probabilistic Integration of Synonymy Relations

نویسندگان

  • Romaric Besançon
  • Jean-Cédric Chappelier
  • Martin Rajman
چکیده

The present contribution focuses on the integration of word senses in a vector representation of texts, using a probabilistic model. The vector representation under consideration is the DSIR model, that extends the standard Vector Space (VS) model by taking both occurrences and co-occurrences of words into account. Integration of word senses into the co-occurrence model is done using a Markov Random Field model with hidden variables, using semantic information derived from synonymy relations extracted from a synonym dictionary.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Readers’ Perceptions of Lexical Cohesion in Text

Preliminary results from an experimental study of readers’ perceptions of lexical cohesion and lexical semantic relations in text are presented. Readers agree on a common “core” of groups of related words and exhibit individual differences. The majority of relations reported are “non-classical” (not hyponymy, meronymy, synonymy, or antonymy). A group of commonly used relations is presented. The...

متن کامل

Improving Probabilistic Latent Semantic Analysis with Principal Component Analysis

Probabilistic Latent Semantic Analysis (PLSA) models have been shown to provide a better model for capturing polysemy and synonymy than Latent Semantic Analysis (LSA). However, the parameters of a PLSA model are trained using the Expectation Maximization (EM) algorithm, and as a result, the trained model is dependent on the initialization values so that performance can be highly variable. In th...

متن کامل

Quantifying the Impact and Extent of Undocumented Biomedical Synonymy

Synonymous relationships among biomedical terms are extensively annotated within specialized terminologies, implying that synonymy is important for practical computational applications within this field. It remains unclear, however, whether text mining actually benefits from documented synonymy and whether existing biomedical thesauri provide adequate coverage of these linguistic relationships....

متن کامل

Sparse Overcomplete Word Vector Representations

Current distributed representations of words show little resemblance to theories of lexical semantics. The former are dense and uninterpretable, the latter largely based on familiar, discrete classes (e.g., supersenses) and relations (e.g., synonymy and hypernymy). We propose methods that transform word vectors into sparse (and optionally binary) vectors. The resulting representations are more ...

متن کامل

Mental Representations of Lyrical Prose

The article analyzes mental representations of Russian lyrical prose texts. The texts demonstrate collective memory engrams that are defined by cultural and historical legacy of the nation and authors’ creative world perception. In architectonics of a lyrical prose text, sense perception reveals itself in accumulated underlying meanings and wisdom conveyed by expressive means. The author’s inte...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001